Experimental determination of Escherichia coli biomass composition for constraint-based metabolic modeling

Genome-scale metabolic models (GEMs) are mathematical representations of metabolism that allow for in silico simulation of metabolic phenotypes and capabilities. A prerequisite for these predictions is an accurate representation of the biomolecular composition of the cell necessary for replication and growth, implemented in GEMs as the so-called biomass objective function (BOF). The BOF contains the metabolic precursors required for synthesis of the cellular macro- and micromolecular constituents (e.g. protein, RNA, DNA), and its composition is highly dependent on the particular organism, strain, and growth condition. Despite its critical role, the BOF is rarely constructed using specific measurements of the modeled organism, drawing the validity of this approach into question. Thus, there is a need to establish robust and reliable protocols for experimental condition-specific biomass determination. Here, we address this challenge by presenting a general pipeline for biomass quantification, evaluating its performance on Escherichia coli K-12 MG1655 sampled during balanced exponential growth under controlled conditions in a batch-fermentor set-up. We significantly improve both the coverage and molecular resolution compared to previously published workflows, quantifying 91.6% of the biomass. Our measurements display great correspondence with previously reported measurements, and we were also able to detect subtle characteristics specific to the particular E. coli strain. Using the modified E. coli GEM iML1515a, we compare the feasible flux ranges of our experimentally determined BOF with the original BOF, finding that the changes in BOF coefficients considerably affect the attainable fluxes at the genome-scale.


Introduction
The increasing availability of large-scale omics data has propelled the study of complex biological systems, pushing the field of systems biology to the forefront of cutting-edge biological […] acids, RNA, fatty acids, and glycogen in Escherichia coli. While they obtained an impressive overall coverage of the total biomass, the quantification was based on isotope ratio analysis, requiring the cells to be fully labeled with ¹³C as well as supplementation with multiple standard compounds for quantification.
Here, we present a concise pipeline for accurate high-coverage absolute biomass quantification, significantly increasing both the yield and molecular resolution in comparison to previous work. E. coli K-12 MG1655 was grown aerobically in a defined glucose minimal medium using a batch fermentor setup. We evaluated the performance of our pipeline, obtaining high consistency with previously reported values. Specifically, we extend and adjust the workflow of Beck et al. [15] by improving the resolution of the carbohydrate analysis using liquid chromatography UV and electrospray ionization ion trap technique (HPLC-UV-ESI-MS/MS) [23]. Furthermore, we monitored fermentation parameters to ensure stable and controlled conditions throughout the experiment, allowing for biomass sampling in the exponential growth phase. With these enhancements, we obtain an overall mass coverage of 91.6%, proving the pipeline to be an important milestone towards absolute biomass quantification for computational biology applications.
To explore the modelling impact of the BOF generated from our experiments, we used flux variability analysis (FVA) on the iML1515 GEM [24]. Specifically, we assessed how, in an aerobic minimal glucose medium, the experimentally determined BOF (hereafter eBOF) differed in predicting feasible flux ranges compared to the BOF included in the iML1515 GEM (hereafter mBOF). The eBOF is implemented in a modified version of the GEM named iML1515a. We find that even though both mBOF and eBOF are supposed to originate from the same experimental conditions, they differ considerably in their coefficient values as well as in their phenotypic predictions of genome-scale flux ranges.

Strain, media and culture conditions
We prepared E. coli strain K-12 MG1655 (700926™, ATCC) glycerol stock solutions by growing the organism on an LB agar plate and selecting a single colony. The LB agar contained the following: 10 g L⁻¹ peptone, 5 g L⁻¹ yeast extract, 5 g L⁻¹ NaCl, and 15 g L⁻¹ agar. An overnight culture in minimal M9 glucose medium was then aliquoted with 25% glycerol and stored at -80 °C. To prepare the inoculum for the fermentation, an aliquot of the glycerol stock solution was grown overnight in an incubator with shaking (37 °C, 200 rpm) using 100 mL of a standard M9 minimal salts medium with glucose as the sole carbon source in a 500 mL baffled shake flask. The minimal M9 medium had the following composition: 0.4% (w/v) glucose, 1 mM MgSO₄ […] PO₄, and 0.2% (v/v) of the same trace mineral solution used in the preculture. The pH electrode was calibrated using a two-step calibration with pH 4 and pH 7 pre-mixed solutions. The dissolved oxygen (DO) electrode was calibrated to 0% by flushing the electrode for 10 min with nitrogen gas, and to 100% dissolved oxygen at 37 °C in the fermentor medium after 30 min with ≈500 mL min⁻¹ air inflow at 500 rpm stirring. Stirring was then coupled to DO, ensuring DO ≥ 40%.
The off-gas was analyzed with an Eppendorf DASGIP GA4 gas analyzer, allowing for continuous monitoring of the O₂ consumption and CO₂ production. The gas in- and outflow from the fermentor was sterile filtered using 0.2 μm filters. The pH was kept constant at pH 7 by automatic titration with 4 M NaOH. Foam production was controlled by the manual addition of silicone-polymer-based antifoam. The bioreactor was inoculated with 1% inoculum. The cells were harvested during exponential growth, centrifuged for 5 min (3645 g, 4 °C), and washed twice in 0.9% NaCl solution, followed by one washing step with MQ water. The pellets were frozen at -80 °C and lyophilized for 3 d. The resulting cell dry mass was then used for the respective protocols.

Medium analysis
The quantification of medium constituent concentrations was performed by NMR, using ERETIC2 [25] in the Bruker TopSpin 4.0.8 software. The protocol is based on Søgaard et al. [26]. At specific time points, medium was collected and sterile filtered, and 2.5 mL were stored at -20 °C before lyophilization and re-hydration in 600 μL D₂O-TSP (0.75%) solution. 500 μL were transferred into a 5 mm NMR tube and analyzed in a 400 MHz (9.4 T) Bruker NMR spectrometer applying the D₂O solvent setting (¹H NMR, noesyggpr1d). The acquisition parameters were 4 dummy scans, 32 scans, SW 21.0368 ppm, O1 1880.61 Hz, TD 65536, TE 300.0 K, D1 4 s, and AQ 3.8928385 s; P1 was calibrated for each sample to ensure accurate quantification. A 70 mM creatine solution (in D₂O) was used as an external standard, utilizing the singlet at ~3 ppm for quantification. As an example, glucose was quantified from the α-glucose doublet at ~5.2 ppm, which accounts for 36% of the glucose [27]. The peaks of formate, acetate, succinate, and lactate were identified based on the reference ¹H-NMR spectra available in the Human Metabolome Database (HMDB) [28][29][30] and the software program Chenomx [31], as well as literature by Fan [32].
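The arithmetic behind the glucose example can be sketched as follows. This is a minimal illustration, not the ERETIC2 implementation: it assumes the generic per-proton integral ratio against the external standard, three protons for the creatine singlet, one proton for the α-glucose anomeric doublet, and quantitative recovery when 2.5 mL of medium is lyophilized and re-hydrated in 600 μL. All function names and integral values are hypothetical.

```python
def qnmr_concentration(c_ref_mM, integral_ref, protons_ref, integral_x, protons_x):
    """Analyte concentration (mM) from per-proton integral ratios.

    Assumes analyte and reference were acquired under identical conditions.
    """
    per_proton_ref = integral_ref / protons_ref
    per_proton_x = integral_x / protons_x
    return c_ref_mM * per_proton_x / per_proton_ref


def glucose_in_medium(alpha_glucose_mM, alpha_fraction=0.36,
                      rehydration_volume_mL=0.6, sample_volume_mL=2.5):
    """Total glucose in the original medium (mM).

    alpha_fraction: share of glucose present as the alpha anomer (36% in the text).
    The volume ratio undoes the concentration step (2.5 mL -> 600 uL),
    assuming no loss during lyophilization (an assumption of this sketch).
    """
    total_in_tube = alpha_glucose_mM / alpha_fraction
    return total_in_tube * rehydration_volume_mL / sample_volume_mL


# Hypothetical integrals: creatine singlet = 300 (3 H), alpha-glucose doublet = 36 (1 H)
alpha_mM = qnmr_concentration(70.0, 300.0, 3, 36.0, 1)
medium_mM = glucose_in_medium(alpha_mM)
print(round(alpha_mM, 2), round(medium_mM, 2))
```

The same two steps apply to the other metabolites, except that no anomer correction is needed when the integrated peak represents the whole compound.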

Protein
The total cellular protein content was measured by acid hydrolysis, followed by amino acid derivatization and quantification by HPLC based on the protocol described by Noble et al. [33]. Aliquots of ~10 mg lyophilized cell dry mass were re-hydrated in 5 mL 6 M hydrochloric acid in a 25 mL Schott flask, boiled for 24 h at 105 °C, and allowed to cool to handling temperature before neutralizing with 5 mL 6 M NaOH. The samples were again cooled to handling temperature before sterile filtering. Using different dilutions (see S3 File) of the filtered samples, the amino acids were quantified by reverse-phase HPLC analysis. The samples and standards were derivatized with o-phthaldialdehyde (OPA) reagent solution. We used a Waters Nova-Pak C18 4 μm column (3.9 × 150 mm) with an RF2000 detector set to 330 nm excitation and 438 nm emission wavelength. Further, we employed two mobile phases (phase A: methanol and phase B: 0.08 M CH₃COONa adjusted to pH 5.90 with concentrated CH₃COOH and 2% tetrahydrofuran just before usage) with a flow rate of 0.[…] Not all amino acids could be measured directly, because some are partially or completely degraded during hydrolysis (e.g. methionine, cysteine). The levels of these amino acids were therefore estimated based on a linear regression of measured amino acid mass fractions and their corresponding prevalence in protein-coding genes. Amino acids with overlapping retention times (i.e. glycine and arginine) were treated in the same fashion, as were glutamine and asparagine, which are deamidated to glutamate and aspartate, respectively, during acid hydrolysis [34].
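The regression-based estimate for amino acids that could not be measured can be illustrated with a short sketch. It assumes an ordinary least-squares line with intercept, fitted with genomic prevalence as the predictor and measured mass fraction as the response; the exact model form used in the study is described in the supplementary files and may differ. All numbers below are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: (prevalence in protein-coding genes, measured mass fraction)
measured = {
    "Ala": (0.095, 0.090),
    "Leu": (0.107, 0.100),
    "Lys": (0.044, 0.050),
    "Phe": (0.039, 0.045),
}
# Hypothetical genomic prevalence of amino acids lost during hydrolysis
unmeasured_prevalence = {"Met": 0.028, "Cys": 0.012}

slope, intercept = fit_line(
    [x for x, _ in measured.values()],
    [y for _, y in measured.values()],
)
estimates = {aa: intercept + slope * x for aa, x in unmeasured_prevalence.items()}
for aa, value in estimates.items():
    print(f"{aa}: estimated mass fraction {value:.3f}")
```

Because the estimate extrapolates from the amino acids that survive hydrolysis, it inherits any systematic bias in those measurements.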

RNA
The cellular proportion of RNA was quantified spectrophotometrically using the protocol described by Benthin et al. [35]. ~30 mg of lyophilized biomass were washed three times with 3 mL 0.7 M HClO₄ by vortexing and centrifuging at 2880 g for 10 min at 4 °C, discarding the supernatant between washes. The resulting cell pellet was re-suspended in 3 mL 3 M KOH and incubated in a water bath at 37 °C for 1 h, shaking at 15 min intervals. The samples were cooled and 1 mL 3 M HClO₄ was added before centrifuging at 2880 g for 10 min at 4 °C, decanting the supernatant into a 50 mL polypropylene centrifuge tube. The pellet was washed (re-suspended and centrifuged) twice with 4 mL 0.5 M HClO₄, before the supernatant was decanted into the 50 mL tube. 3 mL 0.5 M HClO₄ was added to the collected sample, which was then centrifuged to remove any precipitates of KClO₄. The RNA concentration was measured via UV-visible spectroscopy against the reference solvent using the NanoDrop [36]. The levels of the individual ribonucleotides were estimated based on the monomeric composition of rRNA-encoding genes in E. coli strain K-12 MG1655 (GenBank accession number U00096.3 [37]), as these constitute approximately 81% of the total RNA content in E. coli [38].
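Estimating the ribonucleotide distribution from gene sequences amounts to counting bases. The sketch below assumes the rRNA gene sequences are supplied as DNA strings in the sense orientation, so that the transcript is obtained by replacing T with U; the toy sequences and the function name are hypothetical and stand in for the rRNA genes of U00096.3.

```python
from collections import Counter


def ribonucleotide_fractions(gene_sequences):
    """Mole fractions of A, C, G and U across a set of sense-strand gene sequences."""
    counts = Counter()
    for seq in gene_sequences:
        rna = seq.upper().replace("T", "U")  # transcript of the sense strand
        counts.update(base for base in rna if base in "ACGU")
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGU"}


# Toy sequences standing in for rRNA-encoding genes
toy_genes = ["AAATTGAAGAGTTTGATC", "GGTTAAGCGACTAAGCGT"]
fractions = ribonucleotide_fractions(toy_genes)
for base, fraction in fractions.items():
    print(f"{base}MP: {fraction:.3f}")
```

Multiplying these fractions by the measured total RNA content, with the appropriate monomer molar masses, gives per-nucleotide amounts.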

DNA
DNA was extracted using the protocol described in Wright et al. [39]. ~10 mg lyophilized biomass were dissolved in 600 μL lysis buffer (9.34 mL TE buffer containing 10 mM Tris-Cl (pH 8.0) and 1 mM EDTA (pH 8.0), 600 μL 10% SDS, and 60 μL proteinase K (20 mg mL⁻¹)) and incubated at 55 °C for 30 min before cooling to room temperature. 600 μL phenol/chloroform (1:1 v/v) were added and mixed well. The samples were centrifuged for 5 min at 12044 g (max speed) in a table centrifuge at room temperature and the upper aqueous phase was transferred to a separate tube. The addition of phenol/chloroform and the subsequent mixing and centrifuging was repeated twice, each round pooling the aqueous phases. An equal volume of chloroform was added to the aqueous phase and the solution was mixed well. The tube was centrifuged for 5 min at max speed in a table centrifuge at room temperature. To precipitate the DNA, the aqueous phase was separated, mixed gently with 40 μL NaCH₃COO and 1 mL ice-cold ethanol (99%), and incubated at -20 °C for 30 min. The sample was centrifuged for 15 min at max speed in a table centrifuge. The supernatant was discarded and the pellet was rinsed using 1 mL ethanol (70%). The tube was centrifuged for 2 min (table centrifuge, max speed), before carefully discarding the supernatant and air-drying the DNA pellet. The pellet was re-suspended in 50 μL TE buffer and 1 μL RNase A was added before incubating for 15 min at 37 °C. The concentration of dsDNA was measured by UV-visible spectroscopy using the NanoDrop [36]. The relative distribution of individual deoxyribonucleotides was estimated based on the genome sequence of E. coli strain K-12 MG1655 (GenBank accession number U00096.3 [37]).

Carbohydrate
The total carbohydrate content was measured as a commissioned analysis at the Technical University of Munich, Germany, following the protocol described by Rühmann et al. [40]. Briefly, cell dry mass (2 mg L⁻¹) was hydrolyzed in 4 M trifluoroacetic acid for 90 min at 121 °C and derivatized with 1-phenyl-3-methyl-5-pyrazolone. The carbohydrate analysis was then performed via HPLC-UV-ESI-MS/MS. This ensures high-quality identification and quantification of a wide range of chemically diverse carbohydrate monomers and dimers.

Lipid
Lipids were quantified gravimetrically following a chloroform/methanol extraction protocol [41,42]. ~40 mg lyophilized cell dry mass were re-hydrated by adding 0.15 mL water and vortexed briefly at low rpm. The re-hydrated cells were homogenized in a homogenizer at 6500 rpm for 20 s intervals, 2 cycles, along with ~0.5 g zirconium beads (1.4 mm) and 0.4 mL methanol. The samples were kept on ice between runs. 0.8 mL chloroform were added before vortexing for 20 min, subsequently adding 0.1 mL water and vortexing again for 10 min. The sample tubes were centrifuged for 4 min at max speed using a table centrifuge, after which the lower chloroform phase was transferred to a separate tube, before repeating the chloroform extraction with 0.6 mL chloroform. Finally, the chloroform was allowed to completely evaporate (≈24 h to 36 h). The total lipid content was quantified by weighing, and corrected using blanks, as well as loss-adjusted for incomplete retrieval during extraction.

Construction of a novel BOF
Using our experimental measurements, we constructed a novel BOF (eBOF) for the E. coli GEM iML1515. For a detailed description of this process, see S2 and S3 Files. Briefly, the stoichiometric coefficients of the existing macromolecular precursors in the iML1515 model BOF, here termed mBOF, were adjusted to reflect our measurements and normalized to a molar mass (g mmol⁻¹) of unity. In cases where the precursors themselves were complex biomolecules (e.g. lipopolysaccharides (LPS)), their contents were estimated using our measurements and their molecular composition.
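The normalization step can be illustrated with a small sketch. It assumes coefficients in mmol gCDW⁻¹ and molar masses in g mol⁻¹, and, as a simplification of this example, considers only consumed precursors (by-products released by the biomass reaction, such as water, are ignored). The component names and numbers are hypothetical; the actual procedure is detailed in S2 and S3 Files.

```python
def normalize_bof(coefficients, molar_masses):
    """Scale BOF coefficients so that the precursors sum to 1 g per gCDW.

    coefficients: mmol gCDW^-1, keyed by component name
    molar_masses: g mol^-1 for the same keys
    Returns (scaled coefficients, mass covered before scaling in g gCDW^-1).
    """
    covered_mass = sum(
        coefficients[name] * molar_masses[name] for name in coefficients
    ) / 1000.0  # mmol * g/mol = mg, converted to g
    scaled = {name: value / covered_mass for name, value in coefficients.items()}
    return scaled, covered_mass


# Hypothetical two-component biomass
coefficients = {"precursor_a": 2.0, "precursor_b": 1.0}
molar_masses = {"precursor_a": 100.0, "precursor_b": 300.0}

scaled, covered = normalize_bof(coefficients, molar_masses)
print(f"mass covered before scaling: {covered:.3f} g per gCDW")
print(scaled)
```

The lower the covered mass, the larger the scaling factor, which is why a high experimental coverage reduces the distortion introduced by this step.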

Flux comparison by flux variability analysis
To assess the impact on phenotypic predictions, we performed flux variability analysis (FVA) [43,44] on iML1515 using both mBOF and eBOF as the cellular objectives, with an optimality constraint of 100%. For the sake of simplicity and to prevent bias, the modeling was performed using the default exchange rates provided with the model, except that the lower bound on the exchange reaction for cobalamin (EX_cbl1_e) was adjusted from 0 to -1000 mmol gCDW⁻¹ h⁻¹, as the model cannot grow on the mBOF supplied with it (and consequently on the eBOF) without cobalamin uptake.
To compare the resulting minimal and maximal reaction fluxes, we calculated the fractional overlap, ξ, of the corresponding flux ranges. For a given reaction flux v_j with minimal and maximal fluxes α_min,j and α_max,j for eBOF and minimal and maximal fluxes β_min,j and β_max,j for mBOF, ξ is defined as […] (Eq 1). We also calculated the relative value of the eBOF coefficients to those of mBOF, S_r, given by
S_r = α / β (2)
where α and β are the stoichiometric coefficients (mmol gCDW⁻¹) of a given biomass component in eBOF and mBOF, respectively. All simulations were performed in Matlab 2020a [45] using the COBRA toolbox v3 [46] with Cplex [47] as solver.
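Since the exact expression for ξ is not reproduced in this text, the following sketch uses one plausible definition purely as an assumption: the length of the intersection of the two flux ranges divided by the length of their combined span, with two identical point ranges counted as full overlap. It should not be read as the definition used in the study. The coefficient ratio S_r is taken as the eBOF coefficient divided by the mBOF coefficient. Function names are hypothetical.

```python
def fractional_overlap(a_min, a_max, b_min, b_max, tol=1e-9):
    """Assumed overlap measure: intersection length / combined span (0 to 1)."""
    lower = max(a_min, b_min)
    upper = min(a_max, b_max)
    if upper < lower - tol:
        return 0.0  # ranges are disjoint
    span = max(a_max, b_max) - min(a_min, b_min)
    if span <= tol:
        return 1.0  # both ranges collapse onto the same point
    return max(upper - lower, 0.0) / span


def relative_coefficient(alpha, beta):
    """S_r: eBOF coefficient relative to the mBOF coefficient."""
    return alpha / beta


# Thymidylate synthase flux ranges quoted in the Results (mmol gCDW^-1 h^-1)
print(fractional_overlap(0.0098, 0.0098, 0.0218, 0.0218))
# Hypothetical ranges where one is twice as wide as the other
print(fractional_overlap(0.0, 1.0, 0.0, 2.0))
```

Under this assumed definition, reactions whose flux is fixed at different values by the two objectives score zero, which matches the qualitative description of non-overlapping ranges in the Results.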

Batch-fermentation and biomass sampling
To obtain the samples for biomass compositional quantification, we cultured E. coli K-12 MG1655 aerobically in a bioreactor in batch set-up using a defined glucose minimal medium (see section Materials and methods, subsection Strain, media and culture conditions). The cells exhibited a specific growth rate of 0.71 h⁻¹ (generation time 58.7 min), resulting in a cell density of ~2.6 gCDW L⁻¹ when sampled during the balanced exponential growth phase at approximately 7.5 h. The corresponding growth curve and time-course fermentation profile are presented in Fig 1 (data from S4 File; the full time-course profile can be seen in S4 Fig). E. coli displayed a prototypical respiro-fermentative metabolism with extracellular accumulation of the mixed-acid fermentation products acetate, lactate, formate, and succinate. Following the sampling, washing and subsequent lyophilization, the dry cell mass was analyzed using our pipeline as described in Materials and methods.

PLOS ONE
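
The relation between the specific growth rate and the generation time during exponential growth is t_d = ln 2 / μ. A minimal sketch follows; the function name is hypothetical. With the rounded rate of 0.71 h⁻¹ it gives roughly 58.6 min, close to the reported 58.7 min, and the small difference is presumably due to rounding of the growth rate.

```python
import math


def generation_time_min(mu_per_hour):
    """Doubling time in minutes for exponential growth at specific rate mu (h^-1)."""
    return math.log(2) / mu_per_hour * 60.0


print(f"{generation_time_min(0.71):.1f} min")
```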

Macromolecular biomass composition
The measured biomass composition is presented in Table 1, along with the gold-standard reference values by Neidhardt and the biomass distribution reported by Beck et al. [15,38]. While the data from Beck et al. are biomass measurements of E. coli K-12 MG1655 grown in comparable conditions, the data by Neidhardt contains the biomass profile of E. coli B/r based on a combination of experimental data and estimated data from a collection of literature sources. Although not from the same strain, these latter data are routinely used to construct the BOF of E. coli GEMs [9,24] and are commonly employed as a benchmark to evaluate the quality and coverage of biomass composition quantification [15,19].
Constituting the bulk of overall biomass, both the levels of protein (54%) and RNA (19.0%) were found to be closely comparable to those of Neidhardt (at 55.0% and 20.5%, respectively). The relative distribution of individual ribonucleotides was also close to Neidhardt, although we observe higher levels of CMP, similar to what Beck et al. reported. The quantities of the majority of individual amino acids agreed well with the estimated profile by Neidhardt, whereas the amounts of glutamate/glutamine, glycine, and tyrosine were found to be noticeably lower. Both the levels of lipids (6.1%) and DNA (1.3%) were measured to be slightly lower than the values reported by Neidhardt, a similar finding to that of Beck et al. [15]. The contents of the carbohydrate monomer glucose agreed well with that of Neidhardt, whereas the glucosamine amount was found to be significantly higher. We quantified minor amounts of galactose (0.36%) not reported by Neidhardt, in agreement with well-characterized strain-dependent differences in the composition of the outer core region of LPS in E. coli [49,50]. The absence of any detected rhamnose could also be attributed to these strain-specific variations, or simply that the levels were below the detection limit.

Changes in biomass objective function stoichiometries affect the genome-scale metabolism
We initiated the construction of the eBOF by scaling the stoichiometric coefficients of the biomass precursors in iML1515 to reflect our measurements of the biomass composition. The resulting biomass accounts for 91.6% of the total cellular dry mass, a marked improvement on recent work on E. coli [15,19]. We combined this subset of quantified biomass components with the compounds from the iML1515 mBOF that were not measured in this pipeline (i.e. inorganic ions, metabolites, cofactors, and coenzymes), normalizing to obtain a molar mass of 1 g mmol⁻¹.
To assess the change in stoichiometric coefficients of eBOF, we calculated their relative change from mBOF (S_r), as defined in Eq 2. While many biomass coefficients were left unmodified (S_r ≈ 1.0), a considerable proportion of biomass components have significantly altered relative amounts in eBOF (Fig 2A). In fact, we find that ~14.8% (w/w) of biomass was reallocated from mBOF to eBOF (see S1 Fig for details). These changes largely coincide with the differences from the biomass measurements of Neidhardt (Table 1). The contents of deoxyribonucleotides, certain amino acids (glycine, glutamate, and glutamine), and lipid components display the largest decrease, while the peptidoglycan and LPS precursor amounts show the greatest increase.
We compared the minimal and maximal reaction fluxes (FVA, 100% optimality constraint), with mBOF and then eBOF as an objective, to give an unbiased overview of the consequences of change in BOF stoichiometries. For every reaction, we calculated the fractional overlap in flux range (ξ) as defined in Eq 1, the distribution of which is presented in Fig 2B. We see that a significant proportion of reactions have no overlap in flux ranges between the two BOFs (ξ ≈ 0.0). The remaining reactions are largely clustered in two groupings: one where only half […] (Table 2). Similarly, the elevated levels of peptidoglycan precursors resulted in a proportional increase in the corresponding biosynthetic flux (e.g. reaction UAGDP, UDP-N-acetylglucosamine diphosphorylase). On the other hand, the reduction in DNA content significantly shifted the flux range of multiple reactions in the nucleotide precursor metabolism (Table 2). The latter can be exemplified by a stark decline in feasible flux of TMDS (thymidylate synthase) from 0.0218-0.0218 to 0.0098-0.0098 mmol gCDW⁻¹ h⁻¹ necessary for the biosynthesis of dTMP, as well as a corresponding change in flux range for pyruvate synthase from 0.0938-0.0946 to 0.0440-0.0441 mmol gCDW⁻¹ h⁻¹ required for the regeneration of reduced flavodoxin used for converting ribonucleotides into deoxyribonucleotides (Table 2).

Discussion
The BOF is a central pseudo-reaction in constraint-based metabolic models containing the metabolic precursors required for cellular replication and growth [9,15,20,21,51,52]. […] [38]. The biomass composition of an organism, however, can be highly dynamic and dependent on the strain, growth conditions, growth rate, and growth phase [11,19]. Consequently, exact and reliable quantification is key in order for the model to accurately predict metabolic phenotypes. The pipeline presented here for the quantification of biomass composition addresses this issue. Using samples of E. coli K-12 MG1655 grown in a defined minimal glucose medium, we obtained a total mass coverage of 91.6% under experimental conditions comparable to those of Neidhardt [38]. This is a marked improvement on the mass recovery recently reported by Beck et al. of 64.3% [15]. Improving the overall coverage not only provides a more accurate picture of the presence and distribution of biomass components, but also alleviates the unfavourable effects of loss-adjustment by normalization needed to assemble a functional BOF. This normalization step is necessary as the BOF quantitatively represents the conversion of biomass precursors in mmol to gCDW of biomass; thus, the molecular mass of the biomass must be scaled to 1 g mmol⁻¹ to allow for prediction of specific growth rates [10]. Extensive loss-adjustment overestimates the relative proportion of biomass components with a higher recovery. Aiming at maximizing the mass coverage is therefore critical in order to obtain a biomass composition of satisfactory quality for constraint-based metabolic modeling.
We obtain an overall comparable amino acid profile to that reported by Neidhardt, although a few amino acids (e.g. glutamate/glutamine, glycine, tyrosine) were found to deviate significantly (Table 1). The under-reported quantities of glutamate/glutamine are presumably caused by their conversion to 5-oxoproline at prolonged exposure to high temperatures [53], which was not measured in this experiment. The reason for the observed inconsistency in the levels of glycine and tyrosine is less clear, although Long and Antoniewicz [19] similarly reported noticeably lower levels of glycine relative to Neidhardt.
Although quantified by our HPLC analysis, methionine was only detected in low concentrations, with a lot of variation across repeated measurements. We believe this to be caused by the propensity for thiol-containing amino acids to undergo oxidative deterioration during acidic hydrolysis [54]. This is further substantiated by our inability to detect cysteine, requiring us to estimate the contents of both amino acids based on a linear regression of measured amino acid mass fractions and relative amino acid levels of protein-coding genes. Ensuring complete oxidation of methionine and cysteine by the addition of methanesulfonic acid or performic acid would allow for reliable quantification of the more chemically stable products methionine sulphone and cysteic acid [20,54].
The cellular amounts of DNA should be a fairly stable quantity at higher growth rates [55]. The low levels might therefore hint at inadequate extraction during the DNA quantification. While this can be counteracted by spiking the samples with DNA standards or isotopically labeled DNA to account and correct for procedural sample loss, there remains a high risk of unaccounted-for matrix effects due to the complexity and heterogeneity of biomass. Analytical approaches which do not require initial extraction steps could therefore be considered promising alternatives for quantification of cellular DNA. For instance, Huang et al. proposed a method by which the nucleobase levels of complex cellular material are measured using HPLC following vapor and liquid phase hydrolysis to estimate the overall contents of DNA and RNA [56]. As well as providing a unified platform for the simultaneous quantification of both macromolecules, this approach has the added benefit of directly measuring the levels of individual nucleobases, avoiding the need for biased estimates of their relative distribution from the genome.
N-acetylglucosamine, a carbohydrate that comprises a significant proportion of the peptidoglycan cell wall of prokaryotes [57], was not detected in our measurements. We hypothesize that this is caused by extensive de-N-acetylation of N-acetylglucosamine to glucosamine during the preliminary hydrolysis step [58,59]. This is further indicated by our measurements of excessive amounts of glucosamine. While glucosamine is present in other Enterobacteriaceae strains, it is not supposed to be present in the oligosaccharide core of LPS in strain K-12 [60]. The detection of galactose is also in direct accordance with the same strain-specific characteristics, where a single protruding galactose side chain is present in the outer region of the K-12 core type [60]. When constructing eBOF for iML1515, we therefore assumed all detected glucosamine to originate from N-acetylglucosamine, and treated the contents of N-acetylglucosamine and galactose as proxies for peptidoglycan and LPS levels, respectively. Knowing the monomeric stoichiometry of these complex biopolymers, this allowed for seamless integration with the existing BOF of iML1515 by scaling their stoichiometric coefficients based on our experimental measurements (for calculation details, see S2 File). This approach has the added benefit of implicitly quantifying N-acetylmuramic acid in peptidoglycan, and 2-keto-3-deoxy-octonate (KDO) and heptose in LPS, which were not directly measured in our analysis. Accounting for these additional contributions, we end up with an overall carbohydrate content of 7.3%. Our approach therefore entails an improvement in both coverage and molecular resolution, the latter of which is commonly lacking in carbohydrate quantification for the analysis of biomass compositions [15,19,20].
In addition to the carbohydrates listed in Table 1 we measured 5.1% ribose. While this quantity in theory could be employed to verify the levels of RNA [19], the poor stability of ribose at higher temperatures, particularly in strongly acidic conditions [61], hinders its use as a reliable estimate of cellular RNA. The susceptibility of particular carbohydrate monomers to acid-catalyzed thermohydrolysis is therefore an issue that needs to be addressed in future renditions of the pipeline.
After adjusting the BOF coefficients according to the experimental measurements, we arrived at a final mass redistribution of ~14.8% between mBOF and eBOF. This suggests that the current BOF for E. coli, based on older measurements and adapted throughout the years, is well suited for simulating exponential aerobic growth on glucose minimal medium. While interesting in their own right, the changes in BOF coefficients provide minimal information on their effects on the resulting predictions of genome-scale metabolic fluxes. As the biomass precursors are the penultimate end-point of biochemical transformations in GEMs, alterations in their stoichiometric coefficients should propagate throughout the metabolic network and affect the attainable fluxes of the model reactions. We therefore performed FVA for mBOF and eBOF using the default iML1515 uptake parameters to look at the consequences of this redistribution in mass for the metabolism. We observe that the altered BOF stoichiometries considerably impact the range of feasible fluxes in the model. This is evident from the fractional overlap of reaction flux ranges ξ (Fig 2), as well as the relative change in center point of flux ranges (S3 Fig). The latter shows that ~46% of the high flux-carrying reactions have changed their center point by more than 10% when using eBOF compared to mBOF. As the biomass compositions of mBOF and eBOF originate from similar experimental conditions, this difference is rather considerable. Additionally, as biomass compositions vary significantly with growth conditions [12], one would expect the impact on model predictions to be even greater when simulating the metabolic phenotype in different environments. This has profound implications for the application of GEMs and emphasizes the importance of condition-specific biomass measurements when attempting to model a particular scenario.

Conclusion
A detailed and condition-specific BOF is a key element in making accurate predictions of metabolic phenotypes using GEMs. This necessitates high-quality quantification of the biomass composition of the organism in question, and for the experimental condition being modelled. Here, we present a comprehensive analytical pipeline for absolute quantification of the macromolecular biomass composition of E. coli K-12 MG1655 for the construction of strain- and condition-specific BOFs. While rather simple and chiefly relying on well-established protocols for measuring the individual macromolecular classes, we achieved marked improvements in coverage and resolution compared to recently published pipelines. The resulting BOF is made available in the GEM iML1515a.
We applied the experimental pipeline to generate eBOF. The comparison with mBOF revealed a largely similar metabolic phenotype for key attributes like growth and uptake rates, yet the FVA showed a shift in the feasible range of many high-throughput reaction fluxes. Our results therefore highlight the importance of the exact formulation of the BOF, and the need for exact experimental determination for more accurate predictions, even for well-studied organisms such as E. coli under the most standard of conditions. For less-studied organisms, and under more esoteric conditions, one would reasonably expect the impact of a specifically determined BOF to be dramatically higher.
With this work, we address what we regard to be one of the more pressing subjects in the constraint-based metabolic modeling community: the unfortunate tradition of not allocating resources into experimental determination of biomass composition. We have shown that while it remains time-consuming, it is indeed both important and possible to make these measurements. Additionally, such a biomass determination pipeline opens the possibility to generate multiple biomass compositions under different growth conditions for a given organism. It is our hope that the presented protocols and techniques will be further adapted and improved, and that the measurement of biomass composition will become routine.
Fig 1 (caption, partial): […] [unitless], and OD₆₀₀ [unitless]. Unity is highlighted on the same axis as the RQ for reference. The final OD₆₀₀ was estimated from the measured cell concentration of 2.6 gCDW L⁻¹ at sampling, assuming a cell dry weight to OD₆₀₀ conversion factor of 0.5 [15,48]. (B) Concentrations [mM] of the fermentation products lactate, formate, acetate, and succinate. Data used for plotting can be found in S4 File.
This is essentially the same plot as Fig 1,